Journal article
d-blink: Distributed End-to-End Bayesian Entity Resolution
NG Marchant, A Kaplan, DN Elazar, BIP Rubinstein, RC Steorts
Journal of Computational and Graphical Statistics | Taylor & Francis Inc. | Published: 2021
Abstract
Entity resolution (ER; also known as record linkage or de-duplication) is the process of merging noisy databases, often in the absence of unique identifiers. A major advancement in ER methodology has been the application of Bayesian generative models, which provide a natural framework for inferring latent entities with rigorous quantification of uncertainty. Despite these advantages, existing models are severely limited in practice, as standard inference algorithms scale quadratically in the number of records. While scaling can be managed by fitting the model on separate blocks of the data, such a naïve approach may induce significant error in the posterior. In this article, we propose a pri..
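The abstract's scaling argument can be illustrated with a small sketch. It counts the record pairs an all-pairs comparison must consider and contrasts that with comparing only within blocks. The toy records, the exact-surname blocking key, and the helper names `all_pairs` and `blocked_pairs` are hypothetical choices for illustration. This is not the method proposed in the article; it only shows why naïve blocking cuts cost but can separate true matches.

```python
from collections import defaultdict
from itertools import combinations


def all_pairs(n):
    """Number of record pairs compared without blocking: n choose 2 (quadratic in n)."""
    return n * (n - 1) // 2


def blocked_pairs(records, key):
    """Group records by a (hypothetical) blocking key and pair records only within a block."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[key(rec)].append(rec)
    return [pair for block in blocks.values() for pair in combinations(block, 2)]


# Hypothetical toy records: (id, surname, birth year).
# Records 1 and 2 refer to the same person, but a typo ("Smyth")
# places them in different blocks under an exact-surname key.
records = [
    (1, "Smith", 1980),
    (2, "Smyth", 1980),
    (3, "Smith", 1975),
    (4, "Jones", 1980),
]

pairs = blocked_pairs(records, key=lambda r: r[1])
compared_ids = {(a[0], b[0]) for a, b in pairs}

print(all_pairs(len(records)))  # 6 candidate pairs without blocking
print(compared_ids)             # {(1, 3)}: the true match (1, 2) is never compared
```

Blocking reduces the work from 6 pairs to 1 here, and for one million records the all-pairs count is `all_pairs(10**6)`, or 499,999,500,000 pairs. However, any matching pair that straddles two blocks is silently excluded, which is the kind of error the abstract attributes to fitting the model on separate blocks.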
Grants
Awarded by National Science Foundation
Funding Acknowledgements
N. Marchant acknowledges the support of an Australian Government Research Training Program Scholarship and the AMSI Intern program hosted by the Australian Bureau of Statistics. R. C. Steorts and A. Kaplan acknowledge the support of NSF SES-1534412 and CAREER-1652431. B. Rubinstein acknowledges the support of Australian Research Council grant DP150103710. N. Marchant and B. Rubinstein also acknowledge the support of Australian Bureau of Statistics project ABS2018.363.